Field of the Invention

This invention relates to a system and database for the storage, retrieval, and
analysis of cellular information.



Background of the Invention 

Traditionally, cell biology research has largely been a manual, labor-intensive
activity. With the advent of tools that can automate much cell biology experimentation
(see for example U.S. Patent Application SN 08/810,983 filed February 27, 1997,
incorporated by reference herein in its entirety), the rate at which complex information is
generated about the functioning of cells has increased dramatically. As a result, cell
biology is not only an academic discipline, but also the new frontier for large-scale drug
discovery. How to present, organize and analyze the complex information about cell
functioning so that new knowledge can be generated is critical for both pharmaceutical
research and basic cell biology research.



Current technology uses simple text-based presentation of cellular information, or
presents only a subset of the information associated with a cellular entity or event. This
invention enables the user to visually interact with all identified dimensions of cellular
information at the same time and dynamically navigate through those dimensions to find
out the relationship of one piece of information with other pieces of information. This
process facilitates the abstraction of knowledge from information. The Cellomics™
Database uses pathways to capture cellular knowledge. The pathway knowledge is then
used as a portal to unite all other cellular information, thus enabling the synthesis of new
knowledge by investigating the inner relationships of this information.



Other related systems* capture only a small subject area, whereas the Cellomics™
Database is an open system that integrates a wide variety of cellular information.

* Including:

1) EcoCyc from Pangea (exemplified in Nucleic Acids Research 26:50-53 (1998); ISMB
2:203-211 (1994))

2) KEGG pathway database from Institute for Chemical Research, Kyoto University
(Nucleic Acids Research 27:377-379 (1999); Nucleic Acids Research 27:29-34
(1999))

3) CSNDB from Japanese National Institute of Health Sciences (Pac. Symp. Biocomput.
187-197 (1997))

4) SPAD from Graduate School of Genetic Resources Technology, Kyushu University,
Japan

5) PUMA (http://www-c.mcs.anl.gov/home/comp... [remainder of URL illegible]) from
Computational Biology in the Mathematics and Computer Science Division at Argonne National
Laboratory.



The present invention enables the user to visually interact with all identified 
dimensions of cellular information at the same time and dynamically navigate through 
those dimensions to find out the relationship of one piece of information with other 
pieces of information. This process facilitates the abstraction of knowledge from 
information. 



This user interface has the following key features: 

1) Dynamic generation of pathway diagrams to represent cellular functions (see 
Figure 1). 

2) The diagrams of feature 1 capture the spatial information about each entity in the
diagram by associating each entity with a specific cellular compartment.



3) The diagram of feature 1 is used as a navigation tool to retrieve information 
associated with certain cellular functions or entities. Information is presented 
hierarchically, from more general to more specific. Color-coding is used to reflect the 
highest level of generalization. 

4) Cellular information is organized into dimensions. Each dimension is organized 
into hierarchies of information. 

5) Every cellular entity has some information associated with it in each dimension as 
defined in feature 4. When an entity is selected in the diagram of feature 1, its 
corresponding information in the dimensions as well as its position in the relevant 
hierarchy is dynamically presented. 

6) Interactive updating of the diagrams of feature 1. Users can selectively expand 
and/or collapse parts of the diagram, or rearrange the layout of the diagram. Updating of 
the diagram can be achieved by making restrictions on some dimensional information, 
such as "only show the entities that have been shown to be functional in certain cell 
types." Updated information can replace the old information or the old information and 
updated information can be presented in different planes using 3-dimensional diagrams or 
in different windows. When they are presented in different planes, the planes can be 
parallel to each other or at an angle (for example, 90 degrees). 

7) This visual presentation is also used as the basis for a cell editor to input 
information about cellular pathways. The user will draw a cellular pathway onto 
predefined cell templates and a program will capture information about that pathway as it 
is drawn. When more information is necessary or when there is ambiguity, a software 
program will prompt the user for clarification. Before committing the input, a textual
description will be shown to the user so that the user can confirm that the computer has
correctly captured his/her intentions while interacting with the graphical interface. The 
user can directly edit the textual information before it is submitted. 

8) As the user interacts with the database through various visualizations described 
above, a history file is kept to record his/her activities. Upon request, a graphical 
representation of these activities can be plotted. 

9) Different shapes are used to represent different types of entities in the diagram of
feature 1.

10) The user can define his/her own diagram from the underlying data in a database. 

11) The user can zoom and pan. 
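
Features 4 through 6 lend themselves to a small illustration. The following Python sketch is only one possible reading, not the implementation of the Cellomics™ Database: the class `Entity`, the functions `matches` and `restrict`, the dimension names and the sample entities are all hypothetical. It models each dimension value as a path running from general to specific, and applies a restriction of the kind "only show the entities that have been shown to be functional in certain cell types."

```python
from dataclasses import dataclass, field


# Hypothetical model: every entity carries, for each dimension, one or more
# hierarchy paths running from the most general term to the most specific.
@dataclass
class Entity:
    name: str
    dimensions: dict = field(default_factory=dict)  # dimension -> list of paths


def matches(entity, dimension, term):
    """True if `term` appears at any level of any path in `dimension`."""
    return any(term in path for path in entity.dimensions.get(dimension, []))


def restrict(entities, dimension, term):
    """Keep only the entities associated with `term` in `dimension`."""
    return [e for e in entities if matches(e, dimension, term)]


# Illustrative sample data only (not taken from any real database).
entities = [
    Entity("Kinase A", {
        "cell_type": [("blood cell", "lymphocyte", "T cell")],
        "compartment": [("cell", "cytoplasm")],
    }),
    Entity("Receptor B", {
        "cell_type": [("epithelial cell", "hepatocyte")],
        "compartment": [("cell", "plasma membrane")],
    }),
]

visible = restrict(entities, "cell_type", "lymphocyte")
print([e.name for e in visible])
```

Because a restriction matches at any level of a path, restricting on a general term such as "blood cell" also keeps entities recorded only under a more specific term beneath it.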

The Cellomics™ Database uses a standards-based, platform-independent means of
transmitting information (XML), which enables the system to integrate more easily with other
public domain or proprietary information sources.
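
As an illustration of XML as the exchange format, the following minimal Python sketch serializes a small pathway record to XML and parses it back using only the standard library. The element and attribute names follow the DTD of Appendix A; the helper names `build_message` and `read_units` and the sample values are assumptions made for illustration, and the sketch performs no DTD validation.

```python
import xml.etree.ElementTree as ET


def build_message(pathway_id, pathway_name, units):
    """Serialize functional units as a Pathways document (returns bytes).

    `units` is a list of (unit_id, unit_name, biosys_id) tuples.
    """
    root = ET.Element("Pathways")
    pathway = ET.SubElement(root, "Pathway",
                            Pathway_ID=pathway_id, Pathway_Name=pathway_name)
    # Declare each referenced biological system once.
    for biosys_id in sorted({b for _, _, b in units}):
        ET.SubElement(pathway, "BioSys", BioSys_ID=biosys_id)
    for unit_id, unit_name, biosys_id in units:
        ET.SubElement(pathway, "Functional_Unit", Unit_ID=unit_id,
                      Unit_Name=unit_name, BioSys=biosys_id)
    return ET.tostring(root, encoding="utf-8")


def read_units(message):
    """Parse a message and return {Unit_ID: Unit_Name}."""
    root = ET.fromstring(message)
    return {u.get("Unit_ID"): u.get("Unit_Name")
            for u in root.iter("Functional_Unit")}


# Hypothetical sample values.
message = build_message("P1", "Example pathway",
                        [("U1", "Ligand", "B1"), ("U2", "Receptor", "B1")])
print(read_units(message))
```

Any other system that can parse XML could read the same message, which is the integration benefit the paragraph above describes.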

This invention can be used to facilitate the user's understanding of cell 
functioning, to design experiments more intelligently and to analyze experimental results 
more thoroughly. Specifically, this invention can help drug discovery scientists select 
better targets for pharmaceutical intervention in the hope of curing diseases. 

This invention is a complete system that enables the easy storage, retrieval, and
analysis of cellular information. Figure 2 shows the main system components of this
invention, which include the Cellomics™ Database itself, the application software that
runs on the Cellomics™ Database servers, and the client machines. Client machines can
access the Cellomics™ Database through either the Internet or an intranet. Cellomics™
Database application software will access proprietary databases within a customer site or
public domain databases through the Internet. Figure 3 shows the general steps in
interacting with this system. A user can interact with the system in either edit mode or
query mode. Different users will be assigned privileges either to query the Cellomics™
Database only or to both edit and query the database. Appendix A shows the data
representation schema in the form of an XML DTD.
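
The two interaction modes and the per-user privileges could be checked as in the Python sketch below. It is a hypothetical illustration only: the `Privilege` flags, the `USERS` table, the user names and the function `can_enter` are all assumptions and do not describe how the system actually stores or enforces privileges.

```python
from enum import Flag, auto


class Privilege(Flag):
    QUERY = auto()
    EDIT = auto()


# Hypothetical user table: one user may only query, the other may do both.
USERS = {
    "reader": Privilege.QUERY,
    "curator": Privilege.QUERY | Privilege.EDIT,
}

# Privilege required to enter each mode.
REQUIRED = {"query": Privilege.QUERY, "edit": Privilege.EDIT}


def can_enter(user, mode):
    """Return True if `user` holds the privilege needed for `mode`."""
    granted = USERS.get(user, Privilege(0))  # unknown users hold nothing
    return REQUIRED[mode] in granted


print(can_enter("reader", "query"), can_enter("reader", "edit"))
```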

It should be understood that the programs, processes, methods and databases 
described herein are not related or limited to any particular type of computer or network 
system (hardware or software), unless indicated otherwise. Various types of general 
purpose or specialized computer systems may be used with or perform operations in 
accordance with the teachings described herein. 

In view of the wide variety of embodiments to which the principles of the present 
invention can be applied, it should be understood that the illustrated embodiments are 
exemplary only, and should not be taken as limiting the scope of the present invention. 
For example, the steps of the flow diagrams may be taken in sequences other than those 
described, and more or fewer elements may be used in the block diagrams. While 
various elements of the preferred embodiments have been described as being 
implemented in software, in other embodiments hardware implementations may 
alternatively be used and vice versa.

<!-- A DTD for cellular pathway information: PML.dtd
     Author(s): Jian Wang -->
<!-- Log: Start: 12/15/98/JW -->
<!-- Copyright: Cellomics, Inc. -->



<!ELEMENT Pathways (Pathway*)>

<!-- ref defines all the references used in an xml doc. reflink at this level
references what is generic to the whole pathway. reflink at other levels are
references specific to that level -->
<!ELEMENT Pathway
    ((BioSys | Component | Cell_Compartment | Cellular_Process | Functional_Unit |
      Transformations | FeatureInfo | Ref)*, RefLink?, Notes*)>
<!ATTLIST Pathway
    Pathway_ID   ID    #REQUIRED
    Pathway_Name CDATA #IMPLIED>

<!ELEMENT BioSys (Organism?, System?, Organ?, Tissue?, Cell?, Notes*)>
<!ATTLIST BioSys
    BioSys_ID ID #REQUIRED>
<!ELEMENT Organism EMPTY>
<!ATTLIST Organism
    Organism CDATA #REQUIRED
    DevStage CDATA #IMPLIED>
<!ELEMENT System EMPTY>
<!ATTLIST System
    System   CDATA #REQUIRED
    DevStage CDATA #IMPLIED>
<!ELEMENT Organ EMPTY>
<!ATTLIST Organ
    Organ    CDATA #REQUIRED
    DevStage CDATA #IMPLIED>
<!ELEMENT Tissue EMPTY>
<!ATTLIST Tissue
    Tissue   CDATA #REQUIRED
    DevStage CDATA #IMPLIED>
<!ELEMENT Cell EMPTY>
<!ATTLIST Cell
    Cell           CDATA #REQUIRED
    CellCycleStage CDATA #IMPLIED
    DevStage       CDATA #IMPLIED>



<!ELEMENT Cell_Compartment (#PCDATA | Notes)*>
<!ATTLIST Cell_Compartment
    Compartment_ID   ID    #REQUIRED
    Compartment_Name CDATA #REQUIRED>

<!ELEMENT Cellular_Process (#PCDATA | Notes)*>
<!ATTLIST Cellular_Process
    Process_ID   ID    #REQUIRED
    Process_Name CDATA #REQUIRED>

<!ATTLIST Component
    Component_ID   ID    #REQUIRED
    Component_Name CDATA #REQUIRED
    BioSys         IDREF #REQUIRED>

<!ELEMENT Modification (#PCDATA | Notes)*>
<!ATTLIST Modification
    Modification_Site CDATA #IMPLIED
    Modification_Type CDATA #REQUIRED>

<!ELEMENT Functional_Unit (ComponentLink*, Synonym*, RefLink?, Notes*)>
<!ATTLIST Functional_Unit
    Unit_ID   ID    #REQUIRED
    Unit_Name CDATA #REQUIRED
    Unit_Abbr CDATA #IMPLIED
    BioSys    IDREF #REQUIRED
    X_Coord   CDATA #IMPLIED
    Y_Coord   CDATA #IMPLIED
    Shape     (CIRCLE | POLYGON | SQUARE) "CIRCLE">

<!-- the following "SimpleLink" points to the ID of a defined component
or functional_unit or cell_compartment or cellular_process.
The above can be accomplished by using IDREF instead of SimpleLinks.
However, it may be more extensible using links since we know that the
component definitions will be on the server somewhere (outside of any specific
xml doc) in the future. -->

<!ELEMENT ComponentLink (SimpleLink, Notes*)>
<!ATTLIST ComponentLink
    NumberOfComponent    CDATA #IMPLIED
    InCompartment        IDREF #REQUIRED
    UniformInCompartment (TRUE | FALSE) "TRUE">
<!ELEMENT Synonym (Abbreviation*, Notes*)>
<!ATTLIST Synonym
    Synonym CDATA #REQUIRED>
<!ELEMENT Abbreviation (#PCDATA | Notes)*>
<!ATTLIST Abbreviation
    Abbreviation CDATA #REQUIRED>



<!-- having a RefLink element is for the sole purpose of making the 
xml doc more readable; otherwise one would not know what the extended 
link is all about since the "ExtendedLink" element is reused extensively. 
In this case, the href attribute should point to some defined reference in 
the same xml doc using XPointers: "#ID ()"--> 
<!ELEMENT RefLink (ExtendedLink)>

<!ELEMENT Ref ((Publication | Person | Organization)*, Notes*)>
<!ATTLIST Ref
    Ref_ID     ID    #REQUIRED
    Date_Month CDATA #IMPLIED
    Date_Day   CDATA #IMPLIED
    Date_Year  CDATA #IMPLIED>

<!-- the following simplelink links to a medline record -->
<!ELEMENT Publication (Person*, SimpleLink, Notes?)>
<!ATTLIST Publication
    Journal    CDATA #IMPLIED
    Publisher  CDATA #IMPLIED
    PageStart  CDATA #IMPLIED
    PageEnd    CDATA #IMPLIED
    Volume     CDATA #IMPLIED
    Issue      CDATA #IMPLIED
    Type       CDATA #IMPLIED
    Date_Month CDATA #IMPLIED
    Date_Day   CDATA #IMPLIED
    Date_Year  CDATA #IMPLIED>
<!ELEMENT Person (Organization*, Notes*)>
<!ATTLIST Person
    FirstName     CDATA #IMPLIED
    MiddleInit    CDATA #IMPLIED
    LastName      CDATA #IMPLIED
    StreetAddress CDATA #IMPLIED
    City          CDATA #IMPLIED
    State         CDATA #IMPLIED
    ZipCode       CDATA #IMPLIED
    AreaCode      CDATA #IMPLIED
    PhoneNum      CDATA #IMPLIED
    Ext           CDATA #IMPLIED
    Email         CDATA #IMPLIED
    Web           CDATA #IMPLIED
    Role          CDATA #IMPLIED> <!-- "role" could be "contacting author" for example -->

<!ELEMENT Organization (#PCDATA | Notes)*>
<!ATTLIST Organization
    Name CDATA #REQUIRED
    Type (Commercial | Academic | Government) #REQUIRED>



<!-- "Role" describes the function of some item in a collection, such as
"limiting" -->

<!ELEMENT Transformations ((Transformation | Transformations | Effectors)*,
    RefLink?, Notes*)>
<!ATTLIST Transformations
    Transformations_ID   ID    #REQUIRED
    Transformations_Type CDATA #IMPLIED
    Transformations_Name CDATA #IMPLIED
    Role                 CDATA #IMPLIED
    Group_Type           CDATA #IMPLIED> <!-- such as coupled or simultaneous subpathway -->

<!ELEMENT Transformation (Input*, Output*, Effectors*, RefLink?, Notes*)>
<!ATTLIST Transformation
    Transformation_ID   ID    #REQUIRED
    Transformation_Type CDATA #IMPLIED
    Transformation_Name CDATA #IMPLIED
    Role                CDATA #IMPLIED>

<!-- Input, Output and Effector reference Unit -->
<!ELEMENT Input (#PCDATA | Notes)*>
<!ATTLIST Input
    Input_ID IDREF #REQUIRED>
<!ELEMENT Output (#PCDATA | Notes)*>
<!ATTLIST Output
    Output_ID IDREF #REQUIRED>
<!ELEMENT Effectors (Effector*, Notes*)>
<!ATTLIST Effectors
    Group_Type (synergism | xyz) "synergism">
<!ELEMENT Effector (#PCDATA | Notes)*>
<!ATTLIST Effector
    Effector_ID IDREF #REQUIRED
    Effect_Type CDATA #IMPLIED
    Role        CDATA #IMPLIED
    Is_Positive (TRUE | FALSE) "TRUE"> <!-- Effect_Type could be enzyme -->



<!-- Feature_ID references an object of the type specified by Feature_Type -->
<!ELEMENT FeatureInfo (ExtendedLink, Notes*)>
<!ATTLIST FeatureInfo
    Feature_ID   IDREF #REQUIRED
    Feature_Type (Component | Unit | Transformations) #REQUIRED
    Info_Type    (Entity | Assay | Compound | Reference | Pathway | Disease | Credibility) "Entity">

<!ELEMENT ExtendedLink (LinkLocator*, Notes*)>
<!ATTLIST ExtendedLink
    XML-LINK CDATA #FIXED "EXTENDED"
    ROLE     CDATA #IMPLIED
    TITLE    CDATA #IMPLIED
    INLINE   (TRUE | FALSE) "TRUE"
    SHOW     (EMBED | REPLACE | NEW) "REPLACE"
    ACTUATE  (AUTO | USER) "USER">

<!ELEMENT LinkLocator (#PCDATA | Notes)*>
<!ATTLIST LinkLocator
    XML-LINK CDATA #FIXED "LOCATOR"
    ROLE     CDATA #IMPLIED
    HREF     CDATA #REQUIRED
    TITLE    CDATA #IMPLIED
    SHOW     (EMBED | REPLACE | NEW) "REPLACE"
    ACTUATE  (AUTO | USER) "USER">

<!ELEMENT SimpleLink (#PCDATA | Notes)*>
<!ATTLIST SimpleLink
    XML-LINK CDATA #FIXED "SIMPLE"
    HREF     CDATA #REQUIRED
    TITLE    CDATA #IMPLIED>

<!ELEMENT Notes (#PCDATA)>
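
To show how a document shaped by this DTD might be consumed, the Python sketch below parses a small hand-written instance and resolves the IDREF attributes of Input, Output and Effector to Functional_Unit names. The sample document, its IDs and names, and the function `describe_transformations` are invented for illustration. The standard-library parser used here does not validate against the DTD, so the default value of Is_Positive is applied by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; IDs and names are illustrative only.
SAMPLE = """
<Pathways>
  <Pathway Pathway_ID="P1" Pathway_Name="Example pathway">
    <BioSys BioSys_ID="B1"/>
    <Cell_Compartment Compartment_ID="C1" Compartment_Name="cytoplasm"/>
    <Functional_Unit Unit_ID="U1" Unit_Name="Substrate" BioSys="B1"/>
    <Functional_Unit Unit_ID="U2" Unit_Name="Product" BioSys="B1"/>
    <Functional_Unit Unit_ID="U3" Unit_Name="Enzyme" BioSys="B1" Shape="SQUARE"/>
    <Transformations Transformations_ID="TS1">
      <Transformation Transformation_ID="T1" Transformation_Name="conversion">
        <Input Input_ID="U1"/>
        <Output Output_ID="U2"/>
        <Effectors>
          <Effector Effector_ID="U3" Effect_Type="enzyme"/>
        </Effectors>
      </Transformation>
    </Transformations>
  </Pathway>
</Pathways>
"""


def describe_transformations(xml_text):
    """Return (id, input names, output names, effectors) per Transformation."""
    root = ET.fromstring(xml_text.strip())
    # Index functional units by ID so IDREFs can be resolved to names.
    names = {u.get("Unit_ID"): u.get("Unit_Name")
             for u in root.iter("Functional_Unit")}
    result = []
    for t in root.iter("Transformation"):
        inputs = [names[i.get("Input_ID")] for i in t.findall("Input")]
        outputs = [names[o.get("Output_ID")] for o in t.findall("Output")]
        # Is_Positive defaults to "TRUE" in the DTD; mirror that default here.
        effectors = [(names[e.get("Effector_ID")],
                      e.get("Is_Positive", "TRUE") == "TRUE")
                     for e in t.findall("Effectors/Effector")]
        result.append((t.get("Transformation_ID"), inputs, outputs, effectors))
    return result


for row in describe_transformations(SAMPLE):
    print(row)
```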



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

(Attorney Docket No. 99[illegible])

Applicant or Patentee: Wang et al.
Serial or Patent No.: To be assigned
Filed or Issued: Herewith
Title: Database For Storage And Retrieval Of Cellular Information

VERIFIED STATEMENT CLAIMING SMALL ENTITY STATUS
(37 C.F.R. § 1.9(f) AND § 1.27(c)) - SMALL BUSINESS CONCERN

I hereby declare that I am

[ ] the owner of the small business concern identified below:
[X] an official of the small business concern empowered to act on behalf of the concern
    identified below:

NAME OF CONCERN: CELLOMICS, INC.
ADDRESS OF CONCERN: 635 William Pitt Way, Pittsburgh, PA 1523[illegible]

I hereby declare that the above-identified small business concern qualifies as a small business concern as defined in
13 C.F.R. § 121, and referenced in 37 C.F.R. § 1.9(d), for purposes of paying reduced fees to the United States Patent
and Trademark Office, in that the number of employees of the concern, including those of its affiliates, does not
exceed 500 persons. For purposes of this statement, (1) the number of employees of the business concern is the
average over the previous fiscal year of the concern of the persons employed on a full-time, part-time, or temporary
basis during each of the pay periods of the fiscal year, and (2) concerns are affiliates of each other when either,
directly or indirectly, one concern controls or has the power to control the other, or a third party or parties controls or
has the power to control both.

I hereby declare that rights under contract or law have been conveyed to and remain with the small business concern
identified above with regard to the invention, entitled "Database For Storage And Retrieval Of Cellular
Information" by inventor(s) Jian Wang, Chris Harrington, and Lans Taylor,

described in

the specification filed herewith.
Application Serial No. ____, filed ____
Patent No. ____, issued ____

If the rights held by the above-identified small business concern are not exclusive, each individual, concern or
organization having rights in the invention must file verified statements averring to their status as small entities, and
no rights to the invention are held by any person, other than the inventor, who would not qualify as an independent
inventor under 37 C.F.R. § 1.9(c) if that person made the invention, or by any concern which would not qualify as a
small business concern under 37 C.F.R. § 1.9(d), or a nonprofit organization under 37 C.F.R. § 1.9(e).

Each person, concern or organization having any rights to the invention is listed below:

[X] No such person, concern or organization exists.
[ ] Each such person, concern or organization is listed below.

Separate verified statements are required from each named person, concern or organization having rights in the
invention averring to their status as small entities. (37 C.F.R. § 1.27)

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of
entitlement to small entity status prior to paying, or at the time of paying, the earliest of the issue fee or any
fee due after the date on which status as a small entity is no longer appropriate. (37 C.F.R. § 1.28(b))

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on
information and belief are believed to be true; and further that these statements were made with the knowledge that
willful false statements and the like so made are punishable by fine or imprisonment or both, under Section 1001 of
Title 18 of the United States Code, and that such willful false statements may jeopardize the validity of the
application, any patent issuing thereon, or any patent to which this verified statement is directed.

NAME OF PERSON SIGNING: [illegible]
TITLE IN ORGANIZATION: [illegible]
ADDRESS OF PERSON SIGNING: [illegible]

Signature:
Server-side applications send information to client application in XML format

Client application parses XML and presents results graphically for the user to interact with

User modifies XML document through graphical editor

Edit